Automatic Term and Collocation Extraction from English-Croatian corpus
Abstract
Term and collocation bases are valuable additional resources that cover a specific domain and its frequent expressions, and they can then be used in further research. The paper presents a possible model for building a terminology and collocation base, combining statistical and linguistic approaches in order to gain experience in building such resources for the English-Croatian language pair. The aim of the paper is not to evaluate tools, but to give insight into their use and to gain experience in building, training and testing language resources. The paper compares two types of statistically based term and collocation bases, created from legislative documentation and then filtered through language-dependent linguistic patterns.
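The pipeline described above (statistical candidate extraction followed by filtering through linguistic patterns) can be illustrated with a minimal sketch. It assumes pointwise mutual information (PMI) as the association measure, Universal-style POS tags, and a hypothetical two-pattern filter (`ADJ NOUN`, `NOUN NOUN`); none of these choices is taken from the paper. Input tokens are assumed to be already tagged. For Croatian, a highly inflected language, one would likely also need to lemmatize before counting.

```python
import math
from collections import Counter

# Hypothetical POS patterns for English two-word terms (an assumption,
# not the pattern set used in the paper).
PATTERNS = {("ADJ", "NOUN"), ("NOUN", "NOUN")}

def extract_candidates(tagged_tokens, min_freq=2):
    """Score adjacent bigrams by PMI, keeping only those matching a POS pattern."""
    words = [w.lower() for w, _ in tagged_tokens]
    tags = [t for _, t in tagged_tokens]
    n = len(words)
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    # Record the tag pair seen at the first occurrence of each bigram.
    bigram_tags = {}
    for i in range(n - 1):
        bigram_tags.setdefault((words[i], words[i + 1]), (tags[i], tags[i + 1]))
    n_bigrams = n - 1
    scored = []
    for (w1, w2), f12 in bigrams.items():
        if f12 < min_freq:
            continue  # statistical filter: drop rare pairs
        if bigram_tags[(w1, w2)] not in PATTERNS:
            continue  # linguistic filter: drop pairs that fail every pattern
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        p12 = f12 / n_bigrams
        pmi = math.log2(p12 / ((unigrams[w1] / n) * (unigrams[w2] / n)))
        scored.append(((w1, w2), f12, pmi))
    # Highest association first.
    return sorted(scored, key=lambda c: c[2], reverse=True)

# Toy, hand-tagged "legislative" text (invented for illustration).
tagged = [
    ("The", "DET"), ("member", "NOUN"), ("state", "NOUN"), ("shall", "VERB"),
    ("adopt", "VERB"), ("the", "DET"), ("legal", "ADJ"), ("act", "NOUN"), (".", "PUNCT"),
    ("Each", "DET"), ("member", "NOUN"), ("state", "NOUN"), ("shall", "VERB"),
    ("publish", "VERB"), ("the", "DET"), ("legal", "ADJ"), ("act", "NOUN"), (".", "PUNCT"),
]

for pair, freq, pmi in extract_candidates(tagged):
    print(pair, freq, round(pmi, 2))
```

Here frequent pairs such as "state shall" or "the legal" pass the frequency threshold but are rejected by the pattern filter, which is the kind of noise that linguistic filtering is meant to remove from purely statistical output.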
Similar resources
Evaluation of Classification Algorithms and Features for Collocation Extraction in Croatian
Collocations can be defined as words that occur together significantly more often than it would be expected by chance. Many natural language processing applications such as natural language generation, word sense disambiguation and machine translation can benefit from having access to information about collocated words. We approach collocation extraction as a classification problem where the ta...
Automatic Corpus-Based Extraction of Chinese Legal Terms
This paper reports on a study involving the automatic extraction of Chinese legal terms. We used a word segmented corpus of Chinese court judgments to extract salient legal expressions with standard collocation learning techniques. Our method takes the characteristics of Chinese legal terms into account. The extracted terms were evaluated by human markers and compared against a legal term gloss...
Automatic Term Extraction from Knowledge Bank of Economics
KB-N is a web-accessible searchable Knowledge Bank comprising A) a parallel corpus of quality assured and calibrated English and Norwegian text drawn from economic-administrative knowledge domains, and B) a domain-focused database representing that knowledge universe in terms of defined concepts and their respective bilingual terminological entries. A central mechanism in connecting A and B is ...
Extracting terms and terminological collocations from the ELAN Slovene-English parallel corpus
In many scientific, technological or political fields, terminology and the production of up-to-date reference works are lagging behind, which causes problems for translators and results in inconsistent translations. Experience gained in various projects involving parallel corpora shows that automatic extraction of terms and terminological collocations is an achievable goal; however, methods and techn...
Computational Metalexicography in Practice - Corpus-based support for the ...
Computational Metalexicography in Practice - Corpus-based support for the revision of a commercial dictionary. Abstract: In a cooperation between dictionary publishers and computational linguists, raw material for the revision of the German part of a bilingual German-English dictionary (Langenscheidts Handwörterbuch Englisch, Neubearbeitung 1991) was produced. In a case study, the entries for...